1. The introduction about MetaCycle

1.1. General description about MetaCycle

The MetaCycle package is mainly used for detecting rhythmic signals from large scale time-series data. It incorporates ARSER(ARS), JTK_CYCLE(JTK), and Lomb-Scargle(LS) properly for periodic signal detection, and could also output integrated analysis results if required.

1.2. Types of time-series data

The usual time-series data is evenly sampled once at each time point, and the interval between neighbour time points is integer. Not all data are as simple as this. There are datasets with replicate samples, or with missing values, or un-evenly sampled, or sampled with a non-integer interval. Examples of these types of data are shown in the below table.

Data Type Point 1 Point 2 Point 3 Point 4 Point 5 Point 6
The usual data CT0 CT4 CT8 CT12 CT16 CT20
With missing value CT0 NA CT8 CT12 CT16 CT20
With replicates CT0 CT0 CT8 CT8 CT16 CT16
With un-even interval CT0 CT2 CT8 CT10 CT16 CT20
With non-integer interval CT0 CT4.5 CT9 CT13.5 CT18 CT22.5

Of course, some datasets may seem combination of two or more of above types of data.

Data Type Point 1 Point 2 Point 3 Point 4 Point 5 Point 6
With replicates and missing value CT0 CT0 CT8 NA CT16 CT16
With un-even interval and replicates CT0 CT2 CT2 CT10 CT16 CT20

1.3. Method selection

The meta2d function in MetaCycle is designed to analyze differnt types of time-series datasets, and it could automatically select proper method to analyze different types of input datasets. The implementation strategy used for meta2d is shown in the flow chart.

1.4. Integration strategies

In addition to selecting proper methods to analyze different kinds of datasets, meta2d could also output integrated results from multiple methods. Detail explaination about integration stragegies is in the vignettes of MetaCycle.

  • Pvalue
  • Period and phase
    • The integrated period is an arithmetic mean value of multiple periods, while phase integration based on mean of circular quantities is implemented in meta2d.
  • Amplitude calculation
    • meta2d recalculates the amplitude with following model: \[Y_i = B + TRE*(t_i - \frac{\sum_{i=1}^n t_i}{n}) + A*cos(2*\pi*\frac{t_i - PHA}{PER})\]

    • B is baseline level of the time-series profile; TRE is trend level of the time-series profile; A is the amplitude of the waveform. PHA and PER are integrated period and phase mentioned above.
    • The baseline and trend level are explained in the below example.

  • About rAMP
    • meta2d outputs a relative amplitude value (rAMP), which normalizes amplitude values with expression levels. For example, Ugt2b34 has a larger amplitude than Arntl, but its rAMP is smaller than Arntl.

1.5. Description about output values

Column name Description Column name Description
ARS_pvalue pvalue from ARS LS_BH.Q FDR from LS
ARS_BH.Q FDR from ARS LS_period period from LS
ARS_period period from ARS LS_adjphase adjusted phase from LS
ARS_adjphase adjusted phase from ARS LS_amplitude amplitude from LS
ARS_amplitude amplitude from ARS meta2d_pvalue integrated pvalue
JTK_pvalue pvalue from JTK meta2d_BH.Q FDR based on integrated pvalue
JTK_BH.Q FDR from JTK meta2d_period averaged period
JTK_period period from JTK meta2d_phase integrated phase
JTK_adjphase adjusted phase from JTK meta2d_Base baseline value given by meta2d
JTK_amplitude amplitude from JTK meta2d_AMP amplitude given by meta2d
LS_pvalue pvalue from LS meta2d_rAMP relative amplitude

2. Take a quick check about installed software and packages

# load 'shiny' package
library("shiny")
# load 'MetaCycle' package
library("MetaCycle")
# load 'dplyr' package
library("dplyr")
# load 'ggplot2' package
library("ggplot2")
# load 'cowplot' package
library("cowplot")
# change working directory to the desktop 
setwd("~/Desktop")

# check required directories under the desktop
dir()

3. The demo pipeline about MetaCycle

3.1. Open the web application of meta2d

  • Option 1: use ‘runGitHub’ function by typing below command in the ‘Console’ window of RStudio (this option requires that the laptop is connected with internet)
# load shiny package
library("shiny")

# use 'runGitHub' function of shiny package
runGitHub("MetaCycleApp", "gangwug")
  • Option 2: use ‘runApp’ function by typing below command in the ‘Console’ window of RStudio (this option does not require that the laptop is connected with internet)
# load shiny package
library("shiny")

# use 'runApp' function of shiny package
runApp("~/Desktop/MetaCycleApp-master")
  • If the web application is successfully opened, it will show a window like below:

3.2. Analyze three cases with the web application of meta2d

  • caseA (csv): 1) evenly sampled with 4h interval covering two days (from CT18 to CT62), 2) no replicate or missing value, 3) sampling time information is numeric value in the header

    • goal: use JTK_CYCLE to detect circadian transcripts with MetaCycleApp
    • step 1: click ‘Choose File’ button to upload caseA dataset from ‘data’ directory
    • step 2: set the parameters as shown below (set ‘minper’ and ‘maxper’ to 24 and select ‘JTK’ from ‘cycMethod’) and click ‘Run’ button
    • step 3: take a quick check of the analysis results
    • step 4: click ‘download’ button at bottom to download the outputs to ‘result’ directory and name the output file as ‘meta2d_caseA.csv’
  • caseB (txt): 1) evenly sampled with 6h interval covering two days (from CT20 to CT62), 2) no replicate 3) one missing time point at CT50, 4) sampling time information is non-numeric character in the header

    • goal: use JTK_CYCLE and Lomb-Scargle to detect circadian transcripts with MetaCycleApp
    • detail steps: 1) follow caseA to upload caseB dataset from ‘data’ directory, 2) parameters setting: select ‘txt’ under fileStyle, paste comma-delimited time values under ‘timepoints’, set ‘minper’ and ‘maxper’ to 24, and select ‘JTK’ and ‘LS’ under ‘cycMethod’, 3) click ‘Run’ button, 4) click ‘Download’ button to save outputs (named as ‘meta2d_caseB.txt’) to ‘result’ directory, 5) you could see a window like below during analyzing caseB dataset
  • caseC (csv): 1) cell cycle data from yeast, 2) evenly sampled with 16min interval from 2min to 162min, 3) no replicate or missing value, 4) the expected time length for cell cycle of this kind of yeast is 85min, 5) sampling time information is numeric value in the header

    • goal: use ARSER, JTK_CYCLE and Lomb-Scargle to detect cycling transcripts from yeast cell cycle dataset
    • detail steps: 1) follow above cases to upload caseC dataset from ‘data’ directory, 2) parameters setting: select ‘csv’ under fileStyle, type ‘Header’ under ‘timepoints’, set ‘minper’ as 80 and ‘maxper’ as 96, set ‘ARSdefaultPer’ as 85, and select all three methods under ‘cycMethod’, 3) click ‘Run’ button, 4) click ‘Download’ button to save outputs (named as ‘meta2d_caseC.csv’) to ‘result’ directory, 5) you could see a window like below during analyzing caseC dataset

3.3. Please analyze five exercise datasets with MetaCycleApp according to instructions mentioned above

  • general advice: 1) five exercise datasets (with ‘exercise’ in the file name) could be found in ‘data’ directory, 2) select all three methods under ‘cycMethod’ and MetaCycleApp will automaticlly select methods that could be used to analyze input dataset, 3) for search circadian profiles, it is better to set both ‘minper’ and ‘maxper’ as 24, 4) select ‘csv’ or ‘txt’ under ‘fileStyle’ according to input file, 5) sampling time information is non-numeric character in the header of exerciseA, please type comma-deliated time values under ‘timepoints’; for other exercise datasets, type ‘Header’ under ‘timepoints’, 6) save the outputs in ‘result’ directory with different names
  • exerciseA (txt): 1) evenly sampled with 4h interval covering one day (from CT19 to CT39), 2) no replicate or missing value, 3) sampling time information is non-numeric character in the header
  • exerciseB (csv): 1) evenly sampled with 4h interval covering two days, 2) three replicates at each time point, 3) no missing value, 4) sampling time information is numeric value in the header
  • exerciseC (csv): 1) evenly sampled with 4h interval covering two days, 2) varied number of replicates at each time point, 3) no missing value, 4) sampling time information is numeric value in the header
  • exerciseD (csv): 1) evenly sampled with 3h interval covering two days (from CT0 to CT45), 2) three replicates at each time point 3) there is random missing value in some rows, 4) sampling time information is numeric value in the header
  • exerciseE (csv): 1) un-evenly sampled from CT19 to CT57, 2) no replicates or missing value, 3) sampling time information is numeric value in the header

3.4. Analyze the experimental data (it may take several minutes)

  • experimentA: 1) evenly sampled with 2h interval covering two days (from CT18 to CT64), 2) no replicate or missing value, 3) sampling time information is numeric value in the header, 4) including 10K probesets (about 1/4 of total probesets for MOE4302 array)
  • general advice: 1) select ‘JTK’ and ‘LS’ in analyzing high-resolution datasets (eg. 2h / 2days, ‘ARS’ shows relative higher false positive rates in analyzing these kinds of datasets), 2) set ‘minper’ and ‘maxper’ both as ‘24’ for searching circadian transcripts, 3) save outputs in ‘result’ directory and name it as ‘meta2d_experimentA.csv’, 4) if following above advice, you could see a window like below during analyzing this dataset

3.5. Check the expression profiles of circadian transcripts (FDR < 0.05) and show their phase distribution

  • Type below command in the ‘Console’ window of RStudio to prepare the dataframe (‘figureD’) used for drawing figures

# load dplyr package
library("dplyr")

# read in the 'meta2d_experientA.csv' file
dataD <- read.csv("~/Desktop/SRBR_SMTSAworkshop-master/result/meta2d_experimentA.csv", stringsAsFactors = FALSE)
# take a look at column names of the data
colnames(dataD)
##  [1] "CycID"         "JTK_pvalue"    "JTK_BH.Q"      "JTK_period"   
##  [5] "JTK_adjphase"  "JTK_amplitude" "LS_pvalue"     "LS_BH.Q"      
##  [9] "LS_period"     "LS_adjphase"   "LS_amplitude"  "meta2d_pvalue"
## [13] "meta2d_BH.Q"   "meta2d_period" "meta2d_phase"  "meta2d_Base"  
## [17] "meta2d_AMP"    "meta2d_rAMP"   "X18"           "X20"          
## [21] "X22"           "X24"           "X26"           "X28"          
## [25] "X30"           "X32"           "X34"           "X36"          
## [29] "X38"           "X40"           "X42"           "X44"          
## [33] "X46"           "X48"           "X50"           "X52"          
## [37] "X54"           "X56"           "X58"           "X60"          
## [41] "X62"           "X64"
# filter the data by meta2d_BH.Q < 0.05 by 'filter' function of dplyr
cirD <- filter(dataD, meta2d_BH.Q < 0.05)
# see the number of circadian transcripts
nrow(cirD)
## [1] 480
# select "CycID", "meta2d_period", "meta2d_phase", and "X18" to "X64" columns for drawing heatmap by 'select' function of dplyr package
figureD <- select(cirD, CycID, meta2d_period, meta2d_phase, num_range("X", seq(18, 64, by=2), width = 2))
  • Type below command in the ‘Console’ window of RStudio to draw a heatmap figure of circadian transcripts (if the code runs well, a heatmap figure will show in the bottom right window of the RStudio)

# load 'heatmapF' and 'phaseHist' function used to draw heatmap and histogram
source("~/Desktop/SRBR_SMTSAworkshop-master/R/fig.R")

# get the heatmap figure by 'heatmapF' function
heatmapFigure <- heatmapF(inputD = figureD, minfold=0.5, maxfold=2)

heatmapFigure

  • Type below command in the ‘Console’ window of RStudio to draw a histgram for phase distribution of circadian transcripts selected above (if the code runs well, a histogram figure will show in the bottom right window of the RStudio)

# get the phase distribtuion figure by 'phaseHist' function
histFigure <- phaseHist(inputD = figureD, binvalue=seq(0,24,by=2), histcol = "blue")

histFigure

4. The demo pipeline about PSEA

4.1. Prepare the file used for PSEA analysis

  • Type below command in the ‘Console’ window of RStudio to prepare the file used to PSEA analysis

# make sure that 'figureD' dataframe is not deleted, and select its 'CycID' and 'meta2d_phase' columns for later analysis
phaseD <- select(figureD, CycID, meta2d_phase)
# if it reports error after typing above command, please re-run the code of preparing 'figureD' in the first part of 3.5 section of this demo

# read in the annotation file-'annoDbiMainInfMouse4302.txt' in the 'data-raw' directory
annoD <- read.delim("~/Desktop/SRBR_SMTSAworkshop-master/data-raw/Mouse4302ProbeAnnotation.txt", stringsAsFactors = FALSE)
# add annotation information with 'inner_join' function of dplyr package
phaseD <- inner_join(phaseD, annoD, by=c("CycID" = "PROBEID"))
# take a look at the phaseD
head(phaseD)
##          CycID meta2d_phase ENTREZID  SYMBOL
## 1 1439401_x_at     11.74247    26932 Ppp2r5e
## 2   1456766_at     18.59238    77609 Ccdc151
## 3   1438743_at     15.34830    13122  Cyp7a1
## 4 1443849_x_at     16.22043    22275    Urod
## 5 1424118_a_at     13.08001    66442   Spc25
## 6   1416214_at     21.68141    17217    Mcm4
# select "SYMBOL" and 'meta2d_phase' columns for PSEA analysis
pseaD <- select(phaseD, SYMBOL, meta2d_phase)
# take a look at the pseaD
head(pseaD)
##    SYMBOL meta2d_phase
## 1 Ppp2r5e     11.74247
## 2 Ccdc151     18.59238
## 3  Cyp7a1     15.34830
## 4    Urod     16.22043
## 5   Spc25     13.08001
## 6    Mcm4     21.68141
# write the 'pseaD' dataframe to a txt file-'experimentPSEA.txt' in 'result' directory
write.table(pseaD, file = "~/Desktop/SRBR_SMTSAworkshop-master/result/experimentPSEA.txt", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
  • Check the ‘result’ directory and make sure there is a file named ‘experimentPSEA.txt’

4.2. Convert the mouse gene name to its human homolog gene name with ‘MouseHumanConverter.jar’

  • Go to ‘PSEA-master’ directory and open ‘MouseHumanConverter.jar’

  • Click ‘Browse’ button under ‘STEP 1 Select input file:’ to upload the input file (‘experimentPSEA.txt’ in ‘result’ directory of ‘SRBR_SMTSAworkshop-master’) as shown below

  • Click ‘Browse’ button under ‘STEP 2 Select output folder:’ to select the directory used to store output file (select ‘result’ directory of ‘SRBR_SMTSAworkshop-master’) as shown below

  • Click ‘START’ button under ‘STEP 3 Generate output:’ to generate output file and ‘START’ button will change to ‘DONE’ button as shown below

  • Go to ‘result’ directory of ‘SRBR_SMTSAworkshop-master’ and you will find the output file named as ‘human_experimentPSEA.txt’. The output file will look like below

##  PPP2R5E 11.74247
##  CCDC151 18.59238
##  CYP7A1  15.34830
##  UROD    16.22043
##  SPC25  13.08001
##  MCM4    21.68141

4.3. Do PSEA analysis with ‘PSEA1.1_VectorGraphics.jar’

  • Go to ‘PSEA-master’ directory and open ‘PSEA1.1_VectorGraphics.jar’

  • Click ‘Browse’ button under ‘STEP 1 Select items file:’ to upload the input file (‘human_experimentPSEA.txt’ in ‘result’ directory of ‘SRBR_SMTSAworkshop-master’) as shown below

  • Set corresponding parameters under ‘STEP 3 Select parameters:’ according to the sepecific study (for circadian study use the default parameter setting) as shown below

  • Click ‘Browse’ button under ‘STEP 4 Select output folder:’ to select the directory used to store output files (select ‘PSEAout’ directory within ‘result’ directory of ‘SRBR_SMTSAworkshop-master’) as shown below

  • Click ‘START’ button under ‘STEP 5 Generate output:’ to generate output files and ‘START’ button will change to ‘DONE’ button as shown below

  • Go to ‘result’ directory of ‘SRBR_SMTSAworkshop-master’ and you will find the output file named as ‘human_experimentPSEA.txt’. The output file will look like below

5. Try your data or share your experience of analyzing time-series data